Achieving High Output Quality under Limited Resources through Structure-based Spilling in XML Streams
Authors
Abstract
Because of high volumes and unpredictable arrival rates, stream processing systems are not always able to keep up with input data, resulting in buffer overflow and uncontrolled loss of data. To produce eventually complete results, load spilling, which temporarily pushes a fraction of the data to disk, is commonly employed in relational stream engines. In this work, we introduce “structure-based spilling”, a spilling technique customized for XML streams that considers the partial spillage of possibly complex XML elements. Such structure-based spilling brings new challenges. When a path is spilled, multiple paths may be affected. We analyze the possible effects of spilling on the query paths and how to execute the “reduced” query to produce partial results. To select the reduced query that maximizes output quality, we develop three optimization strategies, namely OptR, OptPrune and ToX. We also examine the clean-up stage, which produces supplementary results to guarantee that the entire result set is eventually generated. Our experimental study demonstrates that our proposed solutions consistently achieve higher-quality results than state-of-the-art techniques.
Similar resources
Continuously Providing Approximate Results under Limited Resources: Load Shedding and Spilling in XML Streams
Because of the high volume and unpredictable arrival rates, stream processing systems may not always be able to keep up with the input data streams, resulting in buffer overflow and uncontrolled loss of data. To continuously supply online results, two alternate solutions to tackle this problem of unpredictable failures of such overloaded systems can be identified. One technique, called load she...
Distributed Resource Allocation for Synchronous Fork and Join Processing Networks (Tech Report)
Many emerging information processing applications require applying various fork and join type operations such as correlation, aggregation, and encoding/decoding to data streams in real-time. Each operation will require one or more simultaneous input data streams and produce one or more output streams, where the processing may shrink or expand the data rates upon completion. Multiple tasks can b...
Distributed Resource Allocation for Synchronous Fork and Join Processing Networks (Technical Report)
Many emerging information processing applications require applying various fork and join type operations such as correlation, aggregation, and encoding/decoding to data streams in real-time. Each operation will require one or more simultaneous input data streams and produce one or more output streams, where the processing may shrink or expand the data rates upon completion. Multiple tasks can b...
Delivering QoS in XML Data Stream Processing Using Load Shedding
In recent years, we have witnessed the emergence of new types of systems that deal with large volumes of streaming data. Examples include financial data analysis on feeds of stock tickers, sensor-based environmental monitoring, network traffic monitoring and click stream analysis to push customized advertisements or intrusion detection. Traditional database management systems (DBMS), which are ver...
Mining Data Streams under Dynamically Changing Resource Constraints
Due to the inherent characteristics of data streams, appropriate mining techniques heavily rely on window-based processing and/or (approximating) data summaries. Because resources such as memory and CPU time for maintaining such summaries are usually limited, the quality of the mining results is affected in different ways. Based on Frequent Itemset Mining and an according Change Detection as se...
Journal title:
- PVLDB
Volume 3, Issue
Pages -
Publication date 2010